NSF PAR Search | NSF Public Access Repository

Can machine learning predict late seizures after intracerebral hemorrhages? Evidence from real-world data

Lekoubou, Alain; Petucci, Justin; Ajala, Temitope Femi; Katoch, Avnish; Hong, Jinpyo; Bonilha, Leonardo; Chinchilli, Vernon M; Honavar, Vasant G (August 2024, Epilepsy & Behavior)

Introduction: Intracerebral hemorrhage represents 15 % of all strokes and is associated with a high risk of post-stroke epilepsy. However, there are no reliable methods to identify the patients at higher risk of developing seizures, even though such prediction matters for planning treatment, allocating resources, and advancing post-stroke seizure research. Existing risk models have limitations and have not taken advantage of readily available real-world data and artificial intelligence. This study aims to evaluate the performance of machine-learning-based models in predicting post-stroke seizures at 1 year and 5 years after an intracerebral hemorrhage in unselected patients across multiple healthcare organizations.

Design/methods: We identified patients with intracerebral hemorrhage (ICH) without a prior diagnosis of seizures from 2015 until inception (11/01/22) in the TriNetX Diamond Network, using the International Classification of Diseases, Tenth Revision (ICD-10) code I61 (I61.0, I61.1, I61.2, I61.3, I61.4, I61.5, I61.6, I61.8, and I61.9). The outcome of interest was any ICD-10 diagnosis of seizures (G40/G41) at 1 year and 5 years following the first recorded diagnosis of intracerebral hemorrhage. We applied a conventional logistic regression and a Light Gradient Boosted Machine (LGBM) algorithm, with and without seizure medication use as a predictor. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), the F1 statistic, accuracy, balanced accuracy, precision, and recall.

Results: A total of 85,679 patients had an ICD-10 code of intracerebral hemorrhage and no prior diagnosis of seizures, constituting our study cohort. Seizures were present in 4.57 % and 6.27 % of patients within 1 and 5 years after ICH, respectively.
At 1 year, the AUROC, AUPRC, F1 statistic, accuracy, balanced accuracy, precision, and recall were, respectively, 0.7051 (standard error: 0.0132), 0.1143 (0.0068), 0.1479 (0.0055), 0.6708 (0.0076), 0.6491 (0.0114), 0.0839 (0.0032), and 0.6253 (0.0216). Corresponding metrics at 5 years were 0.694 (0.009), 0.1431 (0.0039), 0.1859 (0.0064), 0.6603 (0.0059), 0.6408 (0.0119), 0.1094 (0.0037), and 0.6186 (0.0264). These values indicate moderate discrimination (AUROC of about 0.70 at both horizons), while precision remained low (0.08 to 0.11), as expected when only 4.57 % to 6.27 % of patients have the outcome.
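The reported 1-year metrics are linked by simple identities, which makes it possible to check them against each other. The sketch below is not the authors' code. It assumes that the evaluation data had the cohort-level 1-year seizure prevalence (4.57 %) and that the identities hold for the averaged metrics; the function name `derived_metrics` is hypothetical. From prevalence, recall, and balanced accuracy alone it recovers specificity, precision, accuracy, and F1, and it compares the AUPRC with the prevalence, which is the AUPRC expected from a classifier with no skill.

```python
def derived_metrics(prevalence, recall, balanced_accuracy):
    """Recover the remaining threshold metrics from three reported values.

    Hypothetical helper: all quantities are expressed as fractions of the
    whole evaluation set, so no raw patient counts are needed.
    """
    # Balanced accuracy is the mean of recall (sensitivity) and specificity.
    specificity = 2 * balanced_accuracy - recall
    true_pos = prevalence * recall
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    precision = true_pos / (true_pos + false_pos)
    accuracy = true_pos + true_neg
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Values taken from the 1-year results in the abstract.
# Assumption: evaluation-set prevalence equals the cohort prevalence.
one_year = derived_metrics(prevalence=0.0457, recall=0.6253,
                           balanced_accuracy=0.6491)

# A no-skill classifier has an expected AUPRC equal to the prevalence,
# so this ratio expresses the reported AUPRC as a multiple of that baseline.
auprc_lift = 0.1143 / 0.0457

for name, value in one_year.items():
    print(f"{name}: {value:.4f}")
print(f"AUPRC relative to no-skill baseline: {auprc_lift:.2f}x")
```

Under these assumptions the derived precision, accuracy, and F1 agree with the reported 0.0839, 0.6708, and 0.1479 to within rounding, and the AUPRC is roughly 2.5 times the no-skill baseline. The low precision is therefore a consequence of the low outcome prevalence combined with a specificity of about 0.67, not an inconsistency in the reporting.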


One-third of protein domains in the CATH database contain a recently discovered tertiary topological motif: non-covalent lasso entanglements, in which a segment of the protein backbone forms a loop closed by non-covalent interactions between residues and is threaded one or more times by the N- or C-terminal backbone segment. It is unknown how frequently this structural motif appears across the proteomes of organisms, and the correlation of these motifs with various classes of protein function and biological processes has not been quantified. Here, using a combination of protein crystal structures, AlphaFold2 predictions, and Gene Ontology terms, we show that in E. coli, S. cerevisiae, and H. sapiens, 71%, 52%, and 49% of globular proteins, respectively, contain one or more non-covalent lasso entanglements in their native fold, and that some of these are highly complex, with multiple threading events. Further, proteins containing these tertiary motifs are consistently enriched in certain functions and biological processes across these organisms and depleted in others, strongly indicating that evolutionary selection pressures act both positively and negatively on the distribution of these motifs. Together, these results demonstrate that non-covalent lasso entanglements are widespread and indicate that they may be extensively utilized for protein function and subcellular processes, thus impacting phenotype.
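The abstract does not say which statistical test was used to call a Gene Ontology term enriched or depleted among entangled proteins. One common choice for this kind of question is a hypergeometric (one-sided Fisher) test, sketched below as an illustration only. The function names and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_pmf(i, total, entangled, annotated):
    """P(exactly i of the annotated proteins are entangled) under random draw."""
    return (comb(entangled, i) * comb(total - entangled, annotated - i)
            / comb(total, annotated))


def enrichment_p(k, total, entangled, annotated):
    """One-sided p-value for seeing k or more entangled proteins with the term."""
    upper = min(entangled, annotated)
    return sum(hypergeom_pmf(i, total, entangled, annotated)
               for i in range(k, upper + 1))


def depletion_p(k, total, entangled, annotated):
    """One-sided p-value for seeing k or fewer entangled proteins with the term."""
    lower = max(0, annotated - (total - entangled))
    return sum(hypergeom_pmf(i, total, entangled, annotated)
               for i in range(lower, k + 1))


# Toy counts (hypothetical): 20 proteins, 10 of them entangled,
# 5 carry a given GO term, and 4 of those 5 are entangled.
p_enriched = enrichment_p(k=4, total=20, entangled=10, annotated=5)
p_depleted = depletion_p(k=4, total=20, entangled=10, annotated=5)
print(f"enrichment p = {p_enriched:.4f}, depletion p = {p_depleted:.4f}")
```

In a proteome-wide analysis such a test would be repeated for every term, so the resulting p-values would normally need a multiple-testing correction before any term is reported as enriched or depleted.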
